Testing Properties of Multiple Distributions with Few Samples
We propose a new setting for testing properties of distributions while
receiving samples from several distributions, but few samples per distribution.
Given samples from T distributions p_1, ..., p_T, we design
testers for the following problems: (1) Uniformity Testing: testing whether all
the p_i's are uniform or ε-far from being uniform in
ℓ1-distance; (2) Identity Testing: testing whether all the p_i's are
equal to an explicitly given distribution q or ε-far from q in
ℓ1-distance; and (3) Closeness Testing: testing whether all the p_i's
are equal to a distribution q to which we have sample access, or
ε-far from q in ℓ1-distance. By assuming an additional natural
condition about the source distributions, we provide sample-optimal testers for
all of these problems.
Comment: ITCS 2024
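To make the uniformity-testing task concrete, here is a minimal sketch of the classical collision-based uniformity tester for a single distribution (the paper's multi-distribution setting is more involved); the threshold constant below is illustrative, not the paper's.

```python
from itertools import combinations

def collision_count(samples):
    """Number of colliding (equal-value) pairs among the samples."""
    return sum(1 for a, b in combinations(samples, 2) if a == b)

def uniformity_test(samples, domain_size, eps):
    """Accept iff the empirical collision rate is close to the uniform
    rate 1/n.  Under uniformity the expected rate is exactly 1/n, while a
    distribution eps-far from uniform in l1 has collision probability at
    least (1 + eps^2)/n, so the two cases separate for enough samples."""
    m = len(samples)
    pairs = m * (m - 1) // 2
    rate = collision_count(samples) / pairs
    threshold = (1 + eps * eps / 2) / domain_size  # illustrative constant
    return rate <= threshold
```

With all-distinct samples the rate is 0 and the tester accepts; with all samples equal the rate is 1 and it rejects.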
A Concentration Inequality for the Facility Location Problem
We give a concentration inequality for a stochastic version of the facility
location problem on the plane. We show the objective is concentrated in a
short interval when the input consists of n i.i.d. uniform points in the
unit square. Our main tool is a suitable geometric quantity, previously
used in the design of approximation algorithms for the facility location
problem, which we use to analyze a martingale process.
Comment: 6 pages, 1 figure
Property Testing of LP-Type Problems
Given query access to a set of constraints S, we wish to quickly check if some objective function f subject to these constraints is at most a given value k. We approach this problem using the framework of property testing, where our goal is to distinguish the case f(S) ≤ k from the case that at least an ε fraction of the constraints in S need to be removed for f(S) ≤ k to hold. We restrict our attention to the case where (S, f) are LP-Type problems, a rich family of combinatorial optimization problems with an inherent geometric structure. By utilizing a simple sampling procedure which has been used previously to study these problems, we are able to create property testers for any LP-Type problem whose query complexities are independent of the number of constraints. To the best of our knowledge, this is the first work that connects the area of LP-Type problems and property testing in a systematic way. Among our results are property testers for a variety of LP-Type problems that are new, as well as for problems that have been studied previously, such as a tight upper bound on the query complexity of testing clusterability with one cluster, considered by Alon, Dar, Parnas, and Ron (FOCS 2000). We also supply a corresponding tight lower bound for this problem and other LP-Type problems using geometric constructions.
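The sampling procedure can be sketched on a toy LP-Type instance: the smallest enclosing interval of points on a line, whose objective is the interval's length. This is a hedged illustration of the generic "solve on a small random sample" idea, not the paper's tester; the sample size is an illustrative stand-in for the O(1/ε)-type bounds.

```python
import random

def interval_length(points):
    """Objective of the LP-Type problem: length of the smallest
    interval enclosing the given constraint set (points)."""
    return max(points) - min(points) if points else 0.0

def lp_type_tester(points, k, eps, seed=0):
    """Sample a constant number of constraints (independent of |S|) and
    solve the LP-Type problem on the sample alone.  If an eps fraction of
    constraints must be removed for the objective to drop to k, the
    sample likely contains witnessing constraints and we reject."""
    rng = random.Random(seed)
    m = max(2, int(4 / eps))  # illustrative sample size
    sample = rng.choices(points, k=m)
    return interval_length(sample) <= k
```

The key point mirrored from the abstract: the query complexity depends only on ε (and the problem's combinatorial dimension), never on the number of constraints.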
Smoothed Analysis of the Condition Number Under Low-Rank Perturbations
Let A be an arbitrary n by n matrix of rank r. We study the
condition number of A plus a low-rank perturbation UV^T, where U and V
are n by k random Gaussian matrices. Under some necessary assumptions, it
is shown that A + UV^T is unlikely to have a large condition number. The main
advantages of this kind of perturbation over the well-studied dense Gaussian
perturbation, where every entry of A is independently perturbed, are the O(nk) cost
to store the perturbation and the O(nk) increase in time complexity for performing the
matrix-vector multiplication (A + UV^T)x. This improves the Ω(n^2) space
and time complexity increase required by a dense perturbation, which is
especially burdensome if A is originally sparse. Our results also extend to
the case where U and V have rank larger than k and to symmetric and
complex settings. We also give an application to linear systems solving and
perform some numerical experiments. Lastly, barriers in applying low-rank noise
to other problems studied in the smoothed analysis framework are discussed.
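The O(nk) overhead comes from associativity: (A + UV^T)x = Ax + U(V^T x), so the perturbation is never materialized. A minimal pure-Python sketch (matrices as lists of rows; function names are illustrative):

```python
def matvec_dense(A, x):
    """Standard O(n^2) dense matrix-vector product A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def matvec_low_rank_perturbed(A, U, V, x):
    """Compute (A + U V^T) x without forming the n-by-n perturbation.
    V^T x costs O(nk) and U (V^T x) costs O(nk), so the overhead over a
    plain A x is O(nk) time and O(nk) extra storage for U and V."""
    k = len(U[0])
    vtx = [sum(V[i][j] * x[i] for i in range(len(x))) for j in range(k)]
    uvx = [sum(U[i][j] * vtx[j] for j in range(k)) for i in range(len(U))]
    ax = matvec_dense(A, x)
    return [a + b for a, b in zip(ax, uvx)]
```

For example, with A the 2x2 identity, U = [[1], [2]], V = [[3], [4]], and x = [1, 1], the product U V^T x = U * 7 = [7, 14], giving [8, 15] overall.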
Faster Linear Algebra for Distance Matrices
The distance matrix of a dataset X of n points with respect to a distance
function f represents all pairwise distances between points in X induced by
f. Due to their wide applicability, distance matrices and related families of
matrices have been the focus of many recent algorithmic works. We continue this
line of research and take a broad view of algorithm design for distance
matrices with the goal of designing fast algorithms, which are specifically
tailored for distance matrices, for fundamental linear algebraic primitives.
Our results include efficient algorithms for computing matrix-vector products
for a wide class of distance matrices, such as the ℓ1 metric, for which we
get a linear runtime, as well as an Ω(n^2) lower bound for any algorithm
which computes a matrix-vector product for the ℓ∞ case, showing a
separation between the ℓ1 and the ℓ∞ metrics. Our upper
bound results, in conjunction with recent works on the matrix-vector query
model, have many further downstream applications, including the fastest
algorithm for computing a relative error low-rank approximation for the
distance matrix induced by ℓ1 and ℓ2^2 functions and the fastest
algorithm for computing an additive error low-rank approximation for the
ℓ2 metric, in addition to applications for fast matrix multiplication
among others. We also give algorithms for constructing distance matrices and
show that one can construct an approximate ℓ2 distance matrix in time
faster than the bound implied by the Johnson-Lindenstrauss lemma.
Comment: Selected as Oral for NeurIPS 2022
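A fast ℓ1 distance-matrix matvec can be illustrated in one dimension with the standard sorting-and-prefix-sums trick (higher dimensions sum per-coordinate contributions). This is a hedged sketch of the general idea, not necessarily the paper's exact algorithm:

```python
def l1_distance_matvec_1d(x, z):
    """Compute y_i = sum_j |x_i - x_j| * z_j in O(n log n) time, versus
    O(n^2) for materializing the distance matrix.  After sorting, each
    |x_i - x_j| splits into signed terms handled with prefix sums."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    xs = [x[i] for i in order]
    zs = [z[i] for i in order]
    # Prefix sums of z and x*z in sorted order.
    pz, pxz = [0.0], [0.0]
    for xi, zi in zip(xs, zs):
        pz.append(pz[-1] + zi)
        pxz.append(pxz[-1] + xi * zi)
    y = [0.0] * n
    for r, i in enumerate(order):
        left = xs[r] * pz[r + 1] - pxz[r + 1]                # j <= r: x_i - x_j >= 0
        right = (pxz[-1] - pxz[r + 1]) - xs[r] * (pz[-1] - pz[r + 1])  # j > r
        y[i] = left + right
    return y
```

For x = [0, 1, 3] and z = [1, 1, 1], the rows of the distance matrix sum to 4, 3, and 5, and the routine returns exactly those values without ever building the matrix.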
Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering
Random dimensionality reduction is a versatile tool for speeding up
algorithms for high-dimensional problems. We study its application to two
clustering problems: the facility location problem, and the single-linkage
hierarchical clustering problem, which is equivalent to computing the minimum
spanning tree. We show that if we project the input pointset X onto a random
O(d)-dimensional subspace (where d is the doubling dimension of
X), then the optimum facility location cost in the projected space
approximates the original cost up to a constant factor. We show an analogous
statement for the minimum spanning tree, but with the dimension having an extra
log log n term and the approximation factor being arbitrarily close to 1.
Furthermore, we extend these results to approximating solutions instead of just
their costs. Lastly, we provide experimental results to validate the quality of
solutions and the speedup due to the dimensionality reduction. Unlike several
previous papers studying this approach in the context of k-means and
k-medians, our dimension bound does not depend on the number of clusters but
only on the intrinsic dimensionality of X.
Comment: 25 pages. Published as a conference paper in ICML 2021
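The projection step itself is simple: multiply each point by a scaled random Gaussian matrix. A minimal sketch of this standard instantiation (pure Python; the scaling and function name are illustrative):

```python
import math
import random

def gaussian_projection(points, target_dim, seed=0):
    """Project d-dimensional points onto a random target_dim-dimensional
    subspace via a Gaussian matrix scaled by 1/sqrt(target_dim), so that
    pairwise distances are preserved in expectation."""
    rng = random.Random(seed)
    d = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    G = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(target_dim)]
    return [[scale * sum(g_j * p_j for g_j, p_j in zip(row, p)) for row in G]
            for p in points]
```

One then runs the facility location or minimum-spanning-tree algorithm on the projected points; the abstract's guarantee is that the optimum cost in the projected space approximates the original up to the stated factors.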